# Vision-Language Joint Reasoning
## Qwen2-VL-7B-VLGuard
License: Apache-2.0
A multimodal vision-language model based on Qwen2-VL-7B and fine-tuned on the VLGuard dataset, focusing on safety-related visual question answering tasks.
Tags: Text-to-Image · English
Author: Foreshhh · 24 · 1
## LLaVA-13b-delta-v0
License: Apache-2.0
LLaVA is an open-source chatbot built on LLaMA/Vicuna and fine-tuned with GPT-generated multimodal instruction-following data; it is a Transformer-based autoregressive language model.
Tags: Text-to-Image · Transformers
Author: liuhaotian · 352 · 221